NONCOMMUTATIVE BENNETT AND ROSENTHAL INEQUALITIES 



MARIUS JUNGE AND QIANG ZENG 



Abstract. In this paper we extend Bennett's and Bernstein's inequalities to the
noncommutative setting. In addition we provide an improved version of the noncommutative
Rosenthal inequality, essentially due to Nagaev and Pinelis, and to Pinelis and Utev, for
commutative random variables. We also present new best constants in Rosenthal's inequality.
Applying these results to random Fourier projections, we recover and elaborate on
fundamental results from compressed sensing, due to Candès, Romberg, and Tao.



0. Introduction 

Rosenthal's inequality [31] was initially discovered in order to construct some new Banach spaces.
However, Rosenthal's inequality also gives a very good bound for the p-norm of sums of independent
random variables, and has found many generalizations and applications. The martingale
version of Rosenthal's inequality was discovered almost simultaneously by Burkholder [I].
Since then the order of the constants in these inequalities has been studied extensively,
in particular by Johnson, Schechtman and Zinn [17]. The correct order in the martingale
version was established by Hitczenko [16], based on fundamental work of Kwapień
and Woyczyński [23]. Nowadays, easy proofs of Rosenthal inequalities can be found with
the help of Bennett's and Bernstein's inequalities, see [3] and the references therein. We
will extend Bennett's inequality to the noncommutative setting. Let us recall that the
classical Rosenthal inequality says that for independent mean zero random variables we have



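(0.1)  $$\Big(\mathbb{E}\Big|\sum_{k=1}^n f_k\Big|^p\Big)^{1/p} \le c(p)\bigg(\Big(\sum_{k=1}^n \mathbb{E}|f_k|^2\Big)^{1/2} + \Big(\sum_{k=1}^n \mathbb{E}|f_k|^p\Big)^{1/p}\bigg).$$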



2010 Mathematics Subject Classification. Primary 46L53; Secondary 46L50, 60E15, 60F10, 94A12. 

Key words and phrases. (Noncommutative) Bennett inequality, (noncommutative) Rosenthal
inequality, (noncommutative) Bernstein inequality, noncommutative $L_p$ spaces, compressed
sensing, large deviation, Cramér's theorem.

The first author was partially supported by NSF Grant DMS-0901457.




According to [17], the order of the best constant here is $c(p) = p/(1 + \log p)$. In this paper
we separate the two terms and ask for



(0.2)  $$\Big(\mathbb{E}\Big|\sum_{k=1}^n f_k\Big|^p\Big)^{1/p} \le A(p)\Big(\sum_{k=1}^n \mathbb{E}|f_k|^2\Big)^{1/2} + B(p)\Big(\sum_{k=1}^n \mathbb{E}|f_k|^p\Big)^{1/p}.$$

The central limit theorem immediately implies $A(p) \ge c\sqrt{p}$ for every choice of $B(p)$.
The problem (0.2) is by no means new. Nagaev and Pinelis [26] obtained a very precise
bound on the tail behaviour of $S_n = \sum_{k=1}^n X_k$ which implies that $(A(p), B(p)) = C(\sqrt{p}, p)$
is possible (although it is not a trivial task to deduce this estimate from their original
inequality). Pinelis and Utev showed that in some sense $A(p) = C\sqrt{p}$ and $B(p) = Cp$ is
best possible. In section 3, we will revisit this problem and show that assuming
$A(p) \le Cp^m$ for some $m \ge 1/2$ we must have

$$B(p) \ge c\,\frac{p}{1 + \log p}.$$

This is exactly consistent with $(A(p), B(p)) = C(p/(1 + \log p), p/(1 + \log p))$. Moreover,
we show that the worst case is obtained for independent random selectors $f_k = \delta_k - \lambda$,
where the selectors $\delta_k$ have expectation $\lambda > 0$.

We will prove a vast generalization of (0.2) in the noncommutative setting for
conditionally independent random variables with $A(p) = C\sqrt{p}$ and $B(p) = Cp$. This improves
the corresponding results from [22] of the form $A(p) = B(p) = Cp$. Our new results are
motivated by applications in compressed sensing for random selectors with matrix valued
coefficients. More precisely, we have to consider rank one operators

$$a_j = [\bar{x}_j(l)\,x_j(r)]_{1 \le l, r \le n}$$

such that $|x_k(j)| \le D$. Then the aim is to estimate



(0.3) 



1 n 



k 



< ? 

B(q) 

for independent selectors $\delta_j \in \{0, 1\}$ with $\mathbb{E}\delta_j = k/n$ and a projection $f$. As in the original
paper [6] by Candès, Romberg, and Tao, a tempting approach is to use moment estimates,
or equivalently estimates of the Schatten p-norm of these matrices. In fact, the improved
Rosenthal inequality allows us to recover the famous estimates in [6].

Let us recall that the noncommutative $L_p$ space associated with the trace on $B(\ell_2)$ is
given by

$$\|x\|_p = [\mathrm{tr}(|x|^p)]^{1/p} = \Big(\sum_j s_j(x)^p\Big)^{1/p},$$

where the singular numbers $s_j(x) = \lambda_j(|x|)$ are the eigenvalues of the positive matrix
$|x| = \sqrt{x^*x}$. Thus a good estimate of (0.3) can certainly be obtained from an estimate of
the form
(0.4) 

P \ yp 1/2 , N i/p 

+cp[j2\\f^f\\i 



E 



p/2 



Let us now describe the more general setup which allows us to prove results in
noncommutative probability which include all the statements above. We assume that
$\mathcal{M}$ is a von Neumann algebra equipped with a normal faithful tracial state $\tau : \mathcal{M} \to \mathbb{C}$,
i.e. $\tau(1) = 1$ and $\tau(xy) = \tau(yx)$. Then $L_p(\mathcal{M}, \tau)$ is the completion of $\mathcal{M}$ with respect
to $\|x\|_p = [\tau(|x|^p)]^{1/p}$. It is well-known (see for example [H1I3T]) that $\|\cdot\|_p$ is a norm for
$1 \le p \le \infty$. In particular, $\|\cdot\|_\infty = \|\cdot\|$. Here and in the following $\|\cdot\|$ will always
denote the operator norm. Let $\mathcal{N} \subset \mathcal{M}$ be a von Neumann subalgebra. Then there exists
a unique conditional expectation $E_{\mathcal{N}} : \mathcal{M} \to \mathcal{N}$ such that $E_{\mathcal{N}}(1) = 1$ and

$$E_{\mathcal{N}}(axb) = aE_{\mathcal{N}}(x)b, \qquad a, b \in \mathcal{N} \text{ and } x \in \mathcal{M}.$$

We say that two subalgebras $\mathcal{N} \subset \mathcal{A}, \mathcal{B} \subset \mathcal{M}$ are independent over $\mathcal{N}$ if

$$E_{\mathcal{N}}(ab) = E_{\mathcal{N}}(a)E_{\mathcal{N}}(b), \qquad a \in \mathcal{A},\ b \in \mathcal{B}.$$

In particular, we say that $x, y \in \mathcal{M}$ are independent if the algebras they generate
are independent over $\mathbb{C}$. A sequence of subalgebras $\mathcal{A}_1, \ldots, \mathcal{A}_n$ is called successively
independent over $\mathcal{N}$ if $\mathcal{A}_{k+1}$ is independent of the algebra $\mathcal{M}(k)$ generated by $\mathcal{A}_1, \ldots, \mathcal{A}_k$.
Our noncommutative Bennett inequality reads as follows.

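Theorem 0.1. Let $\mathcal{N} \subset \mathcal{A}_j \subset \mathcal{M}$ be successively independent over $\mathcal{N}$ and $a_j \in \mathcal{A}_j$ be
self-adjoint such that

i) $E_{\mathcal{N}}(a_j) = 0$; ii) $E_{\mathcal{N}}(a_j^2) \le \sigma_j^2$; iii) $\|a_j\| \le M_j$.

Then for $t \ge 0$,

$$\tau\Big(1_{[t,\infty)}\Big(\sum_{j=1}^n a_j\Big)\Big) \le \exp\bigg(-\frac{\sum_{j=1}^n \sigma_j^2}{\sup_{j=1,\ldots,n} M_j^2}\,\phi\bigg(\frac{t\,\sup_{j=1,\ldots,n} M_j}{\sum_{j=1}^n \sigma_j^2}\bigg)\bigg),$$

where $\phi(x) = (1 + x)\log(1 + x) - x$.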


Here we used $1_I(a) = \int_I dE_t$ for the spectral projection given by the spectral
decomposition $a = \int t\,dE_t$. We should mention that the key new ingredient in this theorem is
the Golden-Thompson inequality, which has already played a crucial role in Gross' paper
[15]. We invite the reader to rewrite the inequality for conditionally independent copies
$x_j$ with $\sigma = \sigma_j$, $M_j = M$. Note that in the commutative context

$$\tau(1_{[t,\infty)}(a)) = \mathrm{Prob}(a \ge t).$$

In the future we will simply take this formula as a definition. Then our Bernstein inequality
for noncommutative random variables reads as follows.

Corollary 0.2. Under the hypotheses of Theorem 0.1,

$$\mathrm{Prob}\Big(\sum_{j=1}^n a_j \ge t\Big) \le \exp\bigg(-\frac{t^2}{2\sum_{j=1}^n \sigma_j^2 + \frac{2}{3}\,t\,\sup_{j=1,\ldots,n} M_j}\bigg).$$



In the work of Ahlswede and Winter [1], and Gross [15], a similar, but different version
of Bernstein's inequality was used. Indeed, in [1] the Bernstein inequality applies to 
independent random variables in a von Neumann algebra and gives information about 
the random spectrum. In our inequality we allow general randomness via independence, 
but obtain a slightly different conclusion. At any rate, from our version it is now rather 
standard to derive Rosenthal's inequality from Bernstein's inequality. 

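Corollary 0.3. Let $2 \le p < \infty$ and $a_j$ satisfy the hypotheses of Theorem 0.1. Then

$$\Big\|\sum_{j=1}^n a_j\Big\|_p \le C\bigg(\sqrt{p}\,\Big(\sum_{j=1}^n \sigma_j^2\Big)^{1/2} + p \sup_{j=1,\ldots,n} M_j\bigg).$$

For unbounded operators and fixed p we can prove a similar inequality. Here we have to
make a slightly stronger assumption. Let us recall that $(\mathcal{A}_i)_{i=1}^n$ are fully independent over
$\mathcal{N}$ if for every subset $I \subset \{1, \ldots, n\}$ the algebra $\mathcal{M}(I)$ generated by $\bigcup_{i \in I} \mathcal{A}_i$ is independent
from $\mathcal{M}(I^c)$ over $\mathcal{N}$.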

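Theorem 0.4. Let $(\mathcal{A}_i)$ be fully independent over $\mathcal{N}$, $1 < p < \infty$, $x_i \in L_p(\mathcal{A}_i)$ with
$E_{\mathcal{N}}(x_i) = 0$. Then

(0.5)  $$\Big\|\sum_{j=1}^n x_j\Big\|_p \le C \max\bigg\{\sqrt{p}\,\Big\|\Big(\sum_{j=1}^n E_{\mathcal{N}}(x_j x_j^* + x_j^* x_j)\Big)^{1/2}\Big\|_p,\ p\Big(\sum_{j=1}^n \|x_j\|_p^p\Big)^{1/p}\bigg\}.$$

If, moreover, $p \ge 2.5$ then

(0.6)  $$\Big\|\sum_{j=1}^n x_j\Big\|_p \le C \max\bigg\{\sqrt{p}\,\Big\|\Big(\sum_{j=1}^n E_{\mathcal{N}}(x_j x_j^* + x_j^* x_j)\Big)^{1/2}\Big\|_p,\ p\,\|(x_j)\|_{L_p(\ell_\infty)}\bigg\}.$$
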
According to [29] and [19], the norm of $(x_j)$ in $L_p(\ell_\infty)$ is given by $\inf\{\|a\|_{2p}\|b\|_{2p}\}$ such
that

$$x_j = a y_j b \quad \text{with } \|y_j\|_\infty \le 1.$$

Clearly, the orders $\sqrt{p}$ and $p$ in the above theorem are optimal because they are already
optimal in commutative probability. Note that in this version Theorem 0.4 improves on
Corollary 0.3 for p large enough. The passage from the first assertion to the second follows
from an argument in [22]. In fact, Rosenthal inequalities in the noncommutative setting
have been successively explored in [20, 21] and [22]. The martingale situation is completely
settled due to the work of [32] which shows that for noncommutative martingales



1/2 



£ll d * 



where $(d_k)$ is a sequence of martingale differences given by $E_k(x) = E_{\mathcal{M}_k}(x)$ and
$d_k = d_k(x) = E_k(x) - E_{k-1}(x)$ for a filtration $(\mathcal{M}_k) \subset \mathcal{M}$. As observed in [21] the constant $Cp$
gives the correct order.

Let us return to the situation in compressed sensing. Here we obtain the following 
result. 

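Corollary 0.5. Let $x_j \in \mathcal{N}$ be positive operators and $\tau$ a normalized trace such that

i) $\frac{1}{m}\sum_{j=1}^m x_j = 1$; ii) $\|x_j\| \le r$.

Let $\delta_j$ be independent selectors such that $\mathbb{E}\delta_j = k/m$. Then for $p \ge 2.5$

(0.7)  $$\Big(\mathbb{E}\Big\|\frac{1}{k}\sum_{j=1}^m \delta_j x_j - 1\Big\|_p^p\Big)^{1/p} \le C \max\Big\{\sqrt{\frac{pr}{k}},\ \frac{pr}{k}\Big\}.$$

Moreover, if tr is a trace on $\mathcal{N}$ such that

$$\|x\|_{L_\infty(\mathrm{tr})} \le \|x\|_{L_p(\mathrm{tr})},$$

and $r/k = \varepsilon^2$, then, for $t^2 \ge 2.5\,C^2 e$ and $t \ge 2.5\,Ce\varepsilon$, we have

(0.8)  $$\mathrm{Prob}\Big(\Big\|\frac{1}{k}\sum_{j=1}^m \delta_j x_j - 1\Big\|_{L_\infty(\mathrm{tr})} \ge t\varepsilon\Big) \le \mathrm{tr}(1)\begin{cases} e^{-t^2/(2C^2 e)} & \text{if } t\varepsilon \le C, \\ e^{-t/(2Ce\varepsilon)} & \text{if } t\varepsilon \ge C. \end{cases}$$

Here C is an absolute constant.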
These results are closely related to the matrix Bernstein inequality from Tropp's paper
[38] and the operator Bernstein inequality from [15]. Their application to problems in
compressed sensing will be explained in section 4. Section 1 provides the proof of Bennett's
and Bernstein's inequalities. An application to large deviation inequalities and how
noncommutative gaussian random variables may violate the classical equalities are discussed
in section 2. The improved Rosenthal inequality is proved in section 3.



1. Noncommutative Bennett's inequality

Let us first recall some background. For a self-adjoint operator $a \in \mathcal{M}$, we have the
spectral decomposition $a = \int t\,dE_t$, where $E_t$ is the spectral measure of $a$. For any Borel
set $A \subset \mathbb{R}$, we define $\mu(A) := \tau(E(A))$. Then $\mu$ is a scalar-valued spectral measure for
$a$ and $\mu(\mathbb{R}) = 1$. By the measurable functional calculus (see for example [11, Section
IX.8]), there exists a *-homomorphism $\pi : L^\infty(\mu) \to \mathcal{M}$ depending on $a$ such that for all
$f \in L^\infty(\mu)$, $\pi(f) = f(a)$, and

(1.1)  $$\tau(f(a)) = \int f(t)\,\mu(dt).$$

In particular, for $f = 1_{[t,\infty)}$, we have the exponential Chebyshev inequality

(1.2)  $$\tau(1_{[t,\infty)}(a)) = \mathrm{Prob}(a \ge t) \le e^{-t}\tau(e^a).$$

Our proof of Bennett's inequality relies on the following crucial result obtained in [36,
Theorem 4], see also [2lETj and the references therein.

Lemma 1.1 (Golden-Thompson inequality). Suppose that $a, b$ are self-adjoint operators,
bounded above, and that $a + b$ is essentially self-adjoint (i.e. the closure of $a + b$ is
self-adjoint). Then

$$\tau(e^{a+b}) \le \tau(e^{a/2} e^b e^{a/2}).$$

Furthermore, if $\tau(e^a) < \infty$ or $\tau(e^b) < \infty$ then

(1.3)  $$\tau(e^{a+b}) \le \tau(e^a e^b).$$

Note that if $a, b \in \mathcal{M}$ are self-adjoint, the hypotheses in Lemma 1.1 are automatically
satisfied. Therefore we have (1.3). With the help of (1.2) and (1.3), we can prove the
noncommutative Bennett inequality following the commutative case given in [3].




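Proof of Theorem 0.1. (1.2) implies for $\lambda \ge 0$,

(1.4)  $$\mathrm{Prob}\Big(\sum_{i=1}^n a_i \ge t\Big) \le e^{-\lambda t}\,\tau\big(e^{\lambda \sum_{i=1}^n a_i}\big).$$

Since $(a_i)$ are successively independent, we deduce from (1.3) that

(1.5)  $$\tau\big(e^{\lambda \sum_{i=1}^n a_i}\big) \le \tau\big(e^{\lambda \sum_{i=1}^{n-1} a_i} e^{\lambda a_n}\big) = \tau\big(E_{\mathcal{N}}(e^{\lambda \sum_{i=1}^{n-1} a_i})\,E_{\mathcal{N}}(e^{\lambda a_n})\big).$$

Expanding, we obtain

$$E_{\mathcal{N}}(e^{\lambda a_n}) = \sum_{k=0}^\infty \frac{\lambda^k E_{\mathcal{N}}(a_n^k)}{k!} = 1 + \sum_{k=2}^\infty \frac{\lambda^k E_{\mathcal{N}}(a_n^k)}{k!} \le 1 + \sum_{k=2}^\infty \frac{\lambda^k \sigma_n^2 M_n^{k-2}}{k!}$$
$$= 1 + \frac{\sigma_n^2}{M_n^2}\big(e^{\lambda M_n} - 1 - \lambda M_n\big) \le \exp\Big(\frac{\sigma_n^2}{M_n^2}\big(e^{\lambda M_n} - 1 - \lambda M_n\big)\Big).$$

Note that the function $f(x) := \exp(x^{-2}(e^{\lambda x} - 1 - \lambda x))$ is increasing for $x > 0$. It follows
that

$$E_{\mathcal{N}}(e^{\lambda a_n}) \le \exp\Big(\frac{\sigma_n^2}{C^2}\big(e^{\lambda C} - 1 - \lambda C\big)\Big)$$

where $C = \sup_{i=1,\ldots,n} M_i$. Iterating n − 2 times, we obtain

$$\tau\big(e^{\lambda \sum_{i=1}^n a_i}\big) \le \exp\Big(\frac{\sum_{i=1}^n \sigma_i^2}{C^2}\big(e^{\lambda C} - 1 - \lambda C\big)\Big).$$

This yields

(1.6)  $$\mathrm{Prob}\Big(\sum_{i=1}^n a_i \ge t\Big) \le \exp\Big(-\lambda t + \frac{\sum_{i=1}^n \sigma_i^2}{C^2}\big(e^{\lambda C} - 1 - \lambda C\big)\Big).$$

By differentiating we find the minimizing value $\lambda = C^{-1}\log(1 + tC/(\sum_{i=1}^n \sigma_i^2))$. Then (1.6)
yields the assertion. □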

It is well known that Bernstein's inequality is a straightforward consequence of Bennett's
inequality.

Proof of Corollary 0.2. Since $\phi(x) \ge x^2/(2 + 2x/3)$ for $x \ge 0$, the corollary follows by
relaxing the bound in Bennett's inequality. □
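The passage from (1.6) to Bennett's and then Bernstein's bound can be checked numerically. The following is only a minimal sketch, not part of the argument: the function names and the sample values of $S = \sum_j \sigma_j^2$ and $C = \sup_j M_j$ are hypothetical choices made for illustration.

```python
import math


def phi(x):
    """Bennett's function phi(x) = (1 + x) log(1 + x) - x for x >= 0."""
    return (1.0 + x) * math.log1p(x) - x


def bennett_bound(t, s, c):
    """Bound of Theorem 0.1 with s = sum of sigma_j^2 and c = sup of M_j."""
    return math.exp(-(s / c**2) * phi(t * c / s))


def bernstein_bound(t, s, c):
    """Bound of Corollary 0.2 with the same s and c."""
    return math.exp(-(t**2) / (2.0 * s + 2.0 * t * c / 3.0))


def chernoff_exponent(lam, t, s, c):
    """Exponent on the right-hand side of (1.6) for a given lambda >= 0."""
    return -lam * t + (s / c**2) * (math.exp(lam * c) - 1.0 - lam * c)


def optimal_lambda(t, s, c):
    """Minimizer lambda = log(1 + t c / s) / c of the exponent in (1.6)."""
    return math.log1p(t * c / s) / c


if __name__ == "__main__":
    s, c = 25.0, 1.0  # hypothetical values, chosen only for illustration
    for t in (1.0, 5.0, 20.0, 100.0):
        print(t, bennett_bound(t, s, c), bernstein_bound(t, s, c))
```

For small $t$ the two bounds are close, while for large $t$ Bennett's bound is noticeably smaller, which reflects the extra logarithmic factor in $\phi$.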
In the following we use Corollary 0.2 to prove Corollary 0.3. Let $a \in \mathcal{M}$ be positive.
Recall that Prob(a > t) is an analog of the classical distribution function of a. In particular
we may use it to compute the $L_p$ norm of a.

Lemma 1.2. Assume that $p > 0$ and that $a \in \mathcal{M}$ is positive. Then

$$\|a\|_p^p = p\int_0^\infty t^{p-1}\,\mathrm{Prob}(a > t)\,dt.$$

Proof. According to Fubini's theorem and (1.1),

$$\tau(a^p) = \int_0^\infty s^p\,\mu(ds) = \int_0^\infty \mu(t, \infty)\,p\,t^{p-1}\,dt = p\int_0^\infty t^{p-1}\,\mathrm{Prob}(a > t)\,dt.$$

□
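For a positive matrix with the normalized trace, the formula of Lemma 1.2 can be verified exactly, since Prob(a > t) is then a step function of t. The sketch below assumes that the operator is given through a list of eigenvalues; the helper names are hypothetical.

```python
def lp_norm_power(eigs, p):
    """tau(a^p) for a positive matrix with eigenvalues eigs and normalized trace."""
    return sum(s**p for s in eigs) / len(eigs)


def layer_cake(eigs, p):
    """p * int_0^inf t^(p-1) Prob(a > t) dt, integrated exactly piece by piece."""
    n = len(eigs)
    pts = [0.0] + sorted(eigs)
    total = 0.0
    for k in range(n):
        lo, hi = pts[k], pts[k + 1]
        # On (lo, hi) exactly n - k eigenvalues exceed t, so Prob(a > t) = (n - k) / n,
        # and p * int_lo^hi t^(p-1) dt = hi^p - lo^p.
        total += (n - k) / n * (hi**p - lo**p)
    return total


if __name__ == "__main__":
    eigs = [0.5, 1.0, 1.0, 3.0]  # hypothetical spectrum
    print(lp_norm_power(eigs, 2.5), layer_cake(eigs, 2.5))
```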



Recall that the Gamma function is defined as $\Gamma(p) = \int_0^\infty e^{-r} r^{p-1}\,dr$ and the incomplete
Gamma function is defined as $\Gamma(a, p) = \int_p^\infty e^{-t} t^{a-1}\,dt$. We need an elementary estimate
for $\Gamma(a, p)$. Note that for $t \ge p \ge 2(a - 1)$, we have

$$-(e^{-t} t^{a-1})' = e^{-t} t^{a-1}\Big(1 - \frac{a-1}{t}\Big) \ge \frac{1}{2}\,e^{-t} t^{a-1}.$$

This gives the following lemma.

Lemma 1.3. If $p \ge 2a - 2$, then $\Gamma(a, p) \le 2e^{-p} p^{a-1}$.
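For integer values of $a$ the incomplete Gamma function has the closed form $\Gamma(a, p) = (a-1)!\,e^{-p}\sum_{k=0}^{a-1} p^k/k!$, so Lemma 1.3 can be tested without numerical integration. This is a sketch for integer $a$ only; the function names are hypothetical.

```python
import math


def upper_incomplete_gamma_int(a, p):
    """Gamma(a, p) = int_p^inf e^(-t) t^(a-1) dt for a positive integer a."""
    partial = sum(p**k / math.factorial(k) for k in range(a))
    return math.factorial(a - 1) * math.exp(-p) * partial


def lemma_bound(a, p):
    """Right-hand side 2 e^(-p) p^(a-1) of Lemma 1.3."""
    return 2.0 * math.exp(-p) * p ** (a - 1)


if __name__ == "__main__":
    for a in (1, 2, 5, 10):
        p = 2.0 * a - 2.0  # boundary case p = 2a - 2
        print(a, p, upper_incomplete_gamma_int(a, p), lemma_bound(a, p))
```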

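Proof of Corollary 0.3. First note that symmetry and Corollary 0.2 imply

$$\mathrm{Prob}\Big(\Big|\sum_{i=1}^n a_i\Big| \ge t\Big) \le 2\exp\bigg(-\frac{t^2}{2\sum_{i=1}^n \sigma_i^2 + \frac{2}{3}\,t\,\sup_{1 \le i \le n} M_i}\bigg).$$

Put $S = \sum_{i=1}^n \sigma_i^2$ and $R = \sup_{i=1,\ldots,n} M_i$. By Lemma 1.2 we have

$$\Big\|\sum_{i=1}^n a_i\Big\|_p^p \le 2p\int_0^\infty \exp\Big(-\frac{t^2}{2S + \frac{2}{3}tR}\Big)t^{p-1}\,dt$$
$$= 2p\int_0^{3S/R} \exp\Big(-\frac{t^2}{2S + \frac{2}{3}tR}\Big)t^{p-1}\,dt + 2p\int_{3S/R}^\infty \exp\Big(-\frac{t^2}{2S + \frac{2}{3}tR}\Big)t^{p-1}\,dt = 2p(I + II),$$

where

$$I = \int_0^{3S/R} \exp\Big(-\frac{t^2}{2S + \frac{2}{3}tR}\Big)t^{p-1}\,dt \quad\text{and}\quad II = \int_{3S/R}^\infty \exp\Big(-\frac{t^2}{2S + \frac{2}{3}tR}\Big)t^{p-1}\,dt.$$
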
We first estimate $I$. Since $t \le 3S/R$, we have

$$I \le \int_0^{3S/R} e^{-t^2/(4S)}\,t^{p-1}\,dt = 2^{p-1} S^{p/2}\int_0^{9S/(4R^2)} e^{-r} r^{p/2-1}\,dr.$$

For $9S/(4R^2) \le p$, we have $I \le 2^{p-1} S^{p/2}\int_0^p e^{-r} r^{p/2-1}\,dr \le 2^p S^{p/2} p^{p/2-1}$. For $9S/(4R^2) > p$,
we have

$$I \le 2^{p-1} S^{p/2}\int_0^p e^{-r} r^{p/2-1}\,dr + 2^{p-1} S^{p/2}\int_p^\infty e^{-r} r^{p/2-1}\,dr \le 2^p S^{p/2} p^{p/2-1} + I_2,$$

where $I_2 = 2^{p-1} S^{p/2}\int_p^\infty e^{-r} r^{p/2-1}\,dr$ and by Lemma 1.3, $I_2 \le 2^p S^{p/2} p^{p/2-1} e^{-p}$. Hence, we
obtain

$$I \le 2^{p+1} S^{p/2} p^{p/2-1}.$$

To estimate $II$, since $2S \le 2tR/3$, we have

$$II \le \int_{3S/R}^\infty e^{-3t/(4R)}\,t^{p-1}\,dt = \Big(\frac{4R}{3}\Big)^p\int_{9S/(4R^2)}^\infty e^{-r} r^{p-1}\,dr \le \Big(\frac{4R}{3}\Big)^p\,\Gamma(p).$$

Combining all the inequalities together, we find $\|\sum_{i=1}^n a_i\|_p^p \le 2^{p+2} S^{p/2} p^{p/2} + 2(4R/3)^p p^{p+1}$.

Hence, we obtain

$$\Big\|\sum_{i=1}^n a_i\Big\|_p \le 4\sqrt{Sp} + \frac{8}{3}\,e^{1/e} Rp \le 4(\sqrt{Sp} + Rp).$$

□



We remark that the constant in the above inequality is explicit and quite small, which
may be useful for numerical purposes.
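As an illustration of the explicit constant, one may compare the bound $4(\sqrt{Sp} + Rp)$ with the exact $L_p$ norm in the simplest commutative instance, namely a sum of n independent signs, where $\sigma_i = M_i = 1$, so that $S = n$ and $R = 1$. The commutative example and the function names are our own illustrative assumptions; the sketch does not probe the noncommutative case.

```python
import math


def rademacher_sum_norm(n, p):
    """Exact L_p norm of eps_1 + ... + eps_n for independent fair signs eps_i."""
    moment = sum(math.comb(n, k) * abs(2 * k - n) ** p for k in range(n + 1)) / 2.0**n
    return moment ** (1.0 / p)


def rosenthal_bound(n, p):
    """The bound 4 (sqrt(S p) + R p) with S = n and R = 1."""
    s, r = float(n), 1.0
    return 4.0 * (math.sqrt(s * p) + r * p)


if __name__ == "__main__":
    for n in (10, 100):
        for p in (2, 4, 7.5, 20):
            print(n, p, rademacher_sum_norm(n, p), rosenthal_bound(n, p))
```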



2. Large deviation principle 



Bennett's inequality is a large deviation type inequality giving an upper bound for the
tail probability. In the commutative setting lower bounds have been analyzed intensively
in Large Deviation Theory. Despite the fact that our arguments in the previous section
are almost commutative, lower bounds for noncommutative random variables are very
different. Let us start with Cramér's Theorem. We consider a sequence of fully independent
and identically distributed (IID) noncommutative random variables $(a_j)_{j \ge 1}$.
Let $\Lambda(\lambda) = \log\tau(e^{\lambda a_1})$. Following [12] we define the Fenchel-Legendre transform of $\Lambda(\lambda)$
for $x \in \mathbb{R}$

(2.1)  $$\Lambda^*(x) = \sup_{\lambda \in \mathbb{R}}[\lambda x - \Lambda(\lambda)].$$

If $(a_j)$ is a commutative IID sequence, then Cramér's Theorem [12, Theorem 2.2.3] says
that $(a_j)$ satisfies the Large Deviation Principle (LDP) with rate function $\Lambda^*$, which implies
[12, Corollary 2.2.19]

(2.2)  $$\lim_{n \to \infty}\frac{1}{n}\log\mathrm{Prob}\Big(\sum_{i=1}^n a_i \ge nt\Big) = -\inf_{s \ge t}\Lambda^*(s).$$



The upper bound remains valid in the noncommutative setting.

Proposition 2.1. Let $(a_j)_{j \ge 1}$ be an IID sequence in $(\mathcal{M}, \tau)$ such that $\tau(a_i) = 0$ for all
$i \ge 1$. Then for any $t > 0$,

$$\limsup_{n \to \infty}\frac{1}{n}\log\mathrm{Prob}\Big(\sum_{i=1}^n a_i \ge nt\Big) \le -\inf_{s \ge t}\Lambda^*(s).$$

Proof. Thanks to the Golden-Thompson inequality, we can follow the proof in the
commutative case in [12]. Using (1.4) and (1.5), we obtain for $\lambda \ge 0$

$$\mathrm{Prob}\Big(\sum_{i=1}^n a_i \ge nt\Big) \le e^{-\lambda nt}\prod_{i=1}^n \tau(e^{\lambda a_i}) = e^{-n(\lambda t - \Lambda(\lambda))}.$$

This implies

$$\frac{1}{n}\log\mathrm{Prob}\Big(\sum_{i=1}^n a_i \ge nt\Big) \le -\Lambda^*(t) \le -\inf_{s \ge t}\Lambda^*(s).$$

□
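In the commutative case the upper bound of Proposition 2.1 can be compared with exact tail probabilities. The sketch below takes independent fair signs, for which $\Lambda(\lambda) = \log\cosh\lambda$ and $\Lambda^*(t) = \frac{1}{2}[(1+t)\log(1+t) + (1-t)\log(1-t)]$ for $0 \le t < 1$. The choice of this example and the function names are assumptions made for illustration only.

```python
import math


def rate(t):
    """Lambda*(t) for a fair sign, valid for 0 <= t < 1."""
    return 0.5 * ((1 + t) * math.log(1 + t) + (1 - t) * math.log(1 - t))


def log_tail(n, t):
    """(1/n) log Prob(eps_1 + ... + eps_n >= n t), computed exactly."""
    k_min = math.ceil(n * (1 + t) / 2)  # number of +1 signs needed
    count = sum(math.comb(n, k) for k in range(k_min, n + 1))
    return (math.log(count) - n * math.log(2)) / n


if __name__ == "__main__":
    for n in (20, 200, 2000):
        print(n, log_tail(n, 0.5), -rate(0.5))
```

The exact normalized log-probabilities stay below $-\Lambda^*(t)$ for every n and approach it as n grows, in accordance with (2.2) in the commutative setting.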



Remark 2.2. Although we assumed that the $a_i$'s are in $(\mathcal{M}, \tau)$, using truncation and
approximation, we can also prove the previous proposition for symmetric gaussians. To be
more precise, for independent symmetric gaussian random variables $a$ and $b$, let
$a_N = a1_{\{|a| \le N\}}$ and $b_N = b1_{\{|b| \le N\}}$. Then the Monotone Convergence Theorem implies that
$\tau(e^{a_N}) \to \tau(e^a)$, $\tau(e^{b_N}) \to \tau(e^b)$. Since the symmetric gaussian random variable is in
$\bigcap_{p \ge 1} L_p(\mathcal{M}, \tau)$, the triangle inequality implies $\tau((a_N + b_N)^p) \to \tau((a + b)^p)$. By symmetry, we have

$$\tau(e^{a_N + b_N}) \to \tau(e^{a+b}).$$

In the following we give two examples which violate the LDP for noncommutative
random variables.
Example 2.3 (Noncommutative semicircular law [39]). Recall that the semicircular law
centered at $a \in \mathbb{R}$ and of radius $r > 0$ is the distribution $\gamma_{a,r} : \mathbb{C}[X] \to \mathbb{C}$ defined by

$$\gamma_{a,r}(P) = \frac{2}{\pi r^2}\int_{a-r}^{a+r} P(t)\sqrt{r^2 - (t - a)^2}\,dt.$$

Here $\mathbb{C}[X]$ is the algebra of complex polynomials in one variable.

Let us recall that copies of semicircular random variables can be constructed on the full
Fock space. Let $H$ be a real Hilbert space and $H_{\mathbb{C}}$ its complexification. Let $\mathcal{T}(H_{\mathbb{C}})$ be
the full Fock space on $H_{\mathbb{C}}$. For any $h \in H$ define $s(h) = l(h) + l(h)^*$, with $l(h)$ the left
creation operator. Let $\Phi(H)$ be the von Neumann algebra generated by $\{s(h) \mid h \in H\}$.
Let $\tau_H$ denote the vector state on $\Phi(H)$ given by the vacuum vector $1 \in \mathcal{T}(H_{\mathbb{C}})$. Then
for any orthonormal system $(h_i) \subset H$ the family of random variables $s_i = s(h_i)$ is free
(thus fully independent) in $(\Phi(H), \tau_H)$ and the distribution of each $s(h_i)$ is the semicircular law
$\gamma_{0,2}$.

By rotation invariance of the free functor we deduce from [39, Section 3.4] that

(2.3)  $$\tilde{s}_n = \frac{1}{\sqrt{n}}\sum_{i=1}^n s_i \sim \gamma_{0,2},$$

which means that the distribution of $\tilde{s}_n$ is $\gamma_{0,2}$. Since $\gamma_{0,2}$ is supported in $[-2, 2]$, for any
$t > 0$,

$$\lim_{n \to \infty}\frac{1}{n}\log\mathrm{Prob}\Big(\sum_{i=1}^n s_i \ge nt\Big) = \lim_{n \to \infty}\frac{1}{n}\log\mathrm{Prob}(\tilde{s}_n \ge \sqrt{n}\,t) = -\infty.$$

On the other hand, by the integral representation of the modified Bessel function $I_1$ [37,
(9.46)], the moment generating function of $\gamma_{0,2}$ is given by

$$M(\lambda) = \frac{1}{2\pi}\int_{-2}^2 e^{\lambda t}\sqrt{4 - t^2}\,dt = \frac{I_1(2\lambda)}{\lambda}.$$

Using the series representation of $I_1$ [37, (9.28)], we have for $\lambda > 0$,

$$M(\lambda) = \sum_{k=0}^\infty \frac{\lambda^{2k}}{k!\,(k+1)!} \ge \frac{1}{4}\,e^\lambda.$$

We find $\Lambda(\lambda) = \log M(\lambda) \ge \lambda - \log 4$. Since $\tau(a_1) = 0$, by [12, Lemma 2.2.5], for $x \ge 0$,

$$\Lambda^*(x) = \sup_{\lambda \ge 0}[\lambda x - \Lambda(\lambda)].$$

Therefore,

$$\Lambda^*(1) = \sup_{\lambda \ge 0}(\lambda - \Lambda(\lambda)) \le \log 4 < \infty,$$

which shows that the sequence $(s_i)$ violates the LDP lower bound in (2.2). We have proved
the following result.

Proposition 2.4. The semicircular sequence $(s_n)_{n \in \mathbb{N}}$ does not satisfy the LDP (2.2).
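The two facts used in Example 2.3, the series for the moment generating function of $\gamma_{0,2}$ and the bound $\Lambda^*(1) \le \log 4$, can be checked numerically. The following sketch uses a plain midpoint rule and a truncated series; the truncation level, step count and function names are hypothetical choices.

```python
import math


def mgf_series(lam, terms=80):
    """M(lam) = sum_k lam^(2k) / (k! (k+1)!), i.e. I_1(2 lam) / lam, truncated."""
    return sum(
        lam ** (2 * k) / (math.factorial(k) * math.factorial(k + 1))
        for k in range(terms)
    )


def mgf_quadrature(lam, steps=20000):
    """Midpoint rule for (1 / (2 pi)) int_{-2}^{2} e^(lam t) sqrt(4 - t^2) dt."""
    h = 4.0 / steps
    total = 0.0
    for i in range(steps):
        t = -2.0 + (i + 0.5) * h
        total += math.exp(lam * t) * math.sqrt(4.0 - t * t)
    return total * h / (2.0 * math.pi)


def rate_at_one(grid_max=20.0, grid_step=0.1):
    """Grid approximation of Lambda*(1) = sup_{lam >= 0} (lam - log M(lam))."""
    n = int(round(grid_max / grid_step))
    return max(j * grid_step - math.log(mgf_series(j * grid_step)) for j in range(n + 1))


if __name__ == "__main__":
    print(mgf_series(1.0), mgf_quadrature(1.0))
    print(rate_at_one(), math.log(4.0))
```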

The counterexample works in free probability because $s_1$ is bounded. In order to
motivate the next example, we first clarify the relationship between the logarithmic moment
generating function $\Lambda$ and the rate function $I$ of the LDP.

Remark 2.5. Suppose that an IID sequence $(a_n)$ satisfies the LDP with rate function $I(x)$
and that $\Lambda(\lambda)$ is well-defined and lower semicontinuous. Then the Fenchel-Legendre
transform of $I(x)$ coincides with $\Lambda(\lambda)$, i.e.

$$I^*(\lambda) = \Lambda(\lambda).$$

Indeed, by Hölder's inequality $\Lambda(\lambda)$ is convex. Then Cramér's Theorem and the duality
lemma [12, Lemma 4.5.8] yield the assertion. In particular, if $(a_n)$ satisfies the LDP with
rate function $I$ and $\Lambda$ is lower semicontinuous, then $I(x) = x^2/2$ implies $\Lambda(\lambda) = I^*(\lambda) =
\lambda^2/2$, i.e. the sequence $(a_n)$ follows the standard normal distribution. We have seen that in
classical probability the distribution of an IID sequence can be recovered from the rate
function given by the LDP provided that $\Lambda$ is well-defined and lower semicontinuous. The
next example will show that this is no longer true in the noncommutative setting.

Example 2.6 (Gaussian family). For $\theta \in (0, 1)$, given a noncommutative standard gaussian
random variable $g_0$ ($\tau(g_0) = 0$ and $\tau(g_0^2) = 1$) and a noncommutative semicircular
random variable $g_1 \sim \gamma_{0,2}$, one can construct (see [10]) a new sequence of IID
noncommutative random variables $(\xi_i)_{i \ge 1}$ with the same distribution as $g_\theta$ such that for all $k \in \mathbb{N}$

$$\tau(g_\theta^k) = (1 - \theta)\tau(g_0^k) + \theta\tau(g_1^k).$$

This implies by approximation (see [18])

$$\tau(f(g_\theta)) = (1 - \theta)\tau(f(g_0)) + \theta\tau(f(g_1))$$

for all measurable functions $f$. In particular, for all $x \in \mathbb{R}$,

(2.4)  $$\mathrm{Prob}(g_\theta \ge x) = (1 - \theta)\mathrm{Prob}(g_0 \ge x) + \theta\,\mathrm{Prob}(g_1 \ge x)$$

and for all $\lambda \in \mathbb{R}$,

(2.5)  $$\tau(e^{\lambda g_\theta}) = (1 - \theta)\tau(e^{\lambda g_0}) + \theta\tau(e^{\lambda g_1}).$$

By (2.4) and the invariance property (2.3), we have

$$\mathrm{Prob}\Big(\sum_{i=1}^n \xi_i \ge nx\Big) = \mathrm{Prob}(g_\theta \ge \sqrt{n}\,x) = (1 - \theta)\mathrm{Prob}(g_0 \ge \sqrt{n}\,x) + \theta\,\mathrm{Prob}(g_1 \ge \sqrt{n}\,x).$$
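Therefore, we obtain the large deviation identity

$$\lim_{n \to \infty}\frac{1}{n}\log\mathrm{Prob}\Big(\sum_{i=1}^n \xi_i \ge nx\Big) = \lim_{n \to \infty}\frac{1}{n}\log\mathrm{Prob}(g_0 \ge \sqrt{n}\,x) = -\frac{x^2}{2}.$$

On the other hand, if we put $\Lambda_\theta(\lambda) = \log\tau(e^{\lambda g_\theta})$ and let $\nu$ denote the probability measure
of the semicircular law $\gamma_{0,2}$, then (2.5) implies

$$\Lambda_\theta(\lambda) = \log\Big((1 - \theta)e^{\lambda^2/2} + \theta\int_{-2}^2 e^{\lambda t}\,\nu(dt)\Big) \le \log(1 - \theta) + \frac{\lambda^2}{2} + \log\Big(1 + \frac{\theta}{1 - \theta}\,e^{2\lambda - \lambda^2/2}\Big)$$

and

$$\Lambda_\theta(\lambda) \ge \log(1 - \theta) + \frac{\lambda^2}{2} + \log\Big(1 + \frac{\theta}{1 - \theta}\,e^{-2\lambda - \lambda^2/2}\Big).$$

As $\lambda \to \infty$, we obtain

(2.6)  $$\Big|\Lambda_\theta(\lambda) - \frac{\lambda^2}{2} - \log(1 - \theta)\Big| \le \frac{\theta}{1 - \theta}\,e^{-c\lambda^2},$$

where $c$ is an absolute constant. But $\Lambda_0(\lambda) = \lambda^2/2$ is the logarithmic moment generating
function of the standard gaussian distribution. Let us record this as follows.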

Proposition 2.7. The sequence $(\xi_n)_{n \ge 1}$ satisfies the LDP (2.2) with rate function $I(x) =
x^2/2$. However, the logarithmic moment generating function of $\xi_1$ differs from $\Lambda_0(\lambda) =
\lambda^2/2$ as shown in (2.6).

We have seen that the law of $(\xi_n)$ cannot be recovered from the LDP rate function.
In view of Remark 2.5 and Proposition 2.7 we understand that the LDP is a measure of
commutativity.
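The sandwich estimate leading to (2.6) can be illustrated numerically from (2.5), using the series for the moment generating function of $\gamma_{0,2}$ from Example 2.3. This is a sketch under our own naming conventions; the truncation level of the series is a hypothetical choice.

```python
import math


def semicircle_mgf(lam, terms=80):
    """Moment generating function of gamma_{0,2}: sum_k lam^(2k) / (k! (k+1)!)."""
    return sum(
        lam ** (2 * k) / (math.factorial(k) * math.factorial(k + 1))
        for k in range(terms)
    )


def log_mgf_gap(lam, theta):
    """Lambda_theta(lam) - lam^2 / 2 - log(1 - theta), computed from (2.5)."""
    ratio = theta / (1.0 - theta)
    return math.log1p(ratio * semicircle_mgf(lam) * math.exp(-lam * lam / 2.0))


if __name__ == "__main__":
    for lam in (0.0, 1.0, 5.0, 10.0):
        print(lam, log_mgf_gap(lam, 0.5))
```

The gap is strictly positive for every $\lambda$, so $\Lambda_\theta$ never agrees with $\lambda^2/2 + \log(1-\theta)$, but it decays rapidly as $\lambda$ grows.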



3. Improved noncommutative Rosenthal's inequality 

In this section we prove the improved noncommutative Rosenthal inequality and show that
the coefficients cannot be improved. In order to prove Theorem 0.4 we will follow
and refine the standard iteration procedure given in [22].

Proof of Theorem 0.4. Instead of proving (0.5) directly, we prove the following equivalent
inequality


(3.1) 



n 








< D p max < 






p 





1/2 



o=i 



Vp 



1/2 



vi=i 



i/p' 



ME 



and we assume at the moment that $D_p$ is the best constant, which may depend on the range
of p. By [22, Theorem 2.1], (3.1) is true for $1 < p < 4$. This is the starting point of our
iteration argument. Assume $p > 2$. We only need to show "$p \Rightarrow 2p$". Let $x_i \in L_{2p}(\mathcal{A}_i, \tau)$.
Write the conditional expectation operator $E = E_{\mathcal{N}}$ in the following proof. Put



A := J2p 



i/s 



.2 ; 



,i=l 



and 5 := 2p ( 



V(2p) 



|2p 
l2p 



2p 



,i=l 



Using [20, Lemma 1.2] and the noncommutative Khintchine inequality in [29] with the
right order of the best constant, we have



n 




n 






n 


1/2 


n 


1/2 1 




< 2E 




< cy^max < 




j Xj Xi 


> 






i=l 


2p 


i=l 


2p 




i=i 


V 


i=l 


,1 



where $(\varepsilon_i)$ is a sequence of Rademacher random variables and $\mathbb{E}$ denotes the corresponding
expectation. Let $y_i = x_i^* x_i - E(x_i^* x_i)$. Then



n 




n 




n 




^ ^ Xj Xj 


< 2 max < 








j 


i=l 


p [ 


1=1 


P 


i=i 





Applying the induction hypothesis, we obtain 



i=l 



< vD p max < y^p 



1/2 



i=l 



1/p' 



HE 



Ml! 



i=l 



Note that 



$$E(y_i^2) = E(|x_i|^4) - (E(|x_i|^2))^2 \le E(|x_i|^4).$$

Moreover,

$$\Big\|\sum_{i=1}^n E(|x_i|^4)\Big\|_{p/2} \le \Big\|\sum_{i=1}^n E(|x_i|^2)\Big\|_p^{(p-2)/(p-1)}\Big(\sum_{i=1}^n \|x_i\|_{2p}^{2p}\Big)^{1/(p-1)}$$
$$= (A^2/2p)^{(p-2)/(p-1)}(B/2p)^{2p/(p-1)} = A^{(2p-4)/(p-1)} B^{2p/(p-1)} (2p)^{-(3p-2)/(p-1)}.$$


On the other hand, since $E$ is a contraction on $L_p(\mathcal{M}, \tau)$, we have



i/p 



i=l 



.1=1 



1/p 



1/p 



< 2 E iw 



2 E 



,2p 
i\\2p 



i=l 



A=l 



This gives

$$\Big\|\sum_{i=1}^n y_i\Big\|_p \le D_p \max\{\sqrt{p}\,A^{(p-2)/(p-1)} B^{p/(p-1)} (2p)^{-(3p-2)/(2p-2)},\ pB^2/(2p^2)\}$$
$$= D_p \max\{2^{-1/2} A^{(p-2)/(p-1)} B^{p/(p-1)} (2p)^{-1-1/(2p-2)},\ B^2/(2p)\}.$$

Hence, we find

(3.2)  $$\Big\|\sum_{i=1}^n x_i^* x_i\Big\|_p \le 2\max\{2^{-1/2} D_p A^{(p-2)/(p-1)} B^{p/(p-1)} (2p)^{-1-1/(2p-2)},\ D_p B^2/(2p),\ A^2/(2p)\}.$$

Young's inequality for products implies

(3.3)  $$A^{(p-2)/(2p-2)} B^{p/(2p-2)} \le A + B \le 2\max\{A, B\}.$$

(3.2) and (3.3) yield

1/2 



E 

i=i 



< max{2 1/4 v //^max{A, J B}, ^/D~ P B, A} < 2 1/4 ^/D p ~max{A, B} 



2 1/4 v //^max \ yf2p~ 



1/2 



i=l 



V(2p) ' 



ME 



|2p 
l2p 



2p 



i=l 
Here we assumed $D_p \ge 2^{-1/2}$ without loss of generality. Applying the same argument to
$x_i x_i^*$, we obtain



i=l 



1/2 



< 2 1/4 v //^max 



1/2 



i=l 



ME 



V(ap) ' 



,2p 
i|l2p 



2p 



J=l 



Hence, (3.1) is true for $2p$ with constant $2^{1/4} c\sqrt{D_p}$. It follows that

$$D_{2p} \le 2^{1/4} c\sqrt{D_p}$$

and thus $D_p \le \sqrt{2}\,c^2$, which is independent of p. Therefore, the iteration argument is done
and we have proved the first assertion. As mentioned in the introduction of this paper,
the interpolation argument from [22, section 4] shows that the first assertion improves to
the second assertion with a singularity as p tends to 2. Thus for $p \ge 2.5$ the assertion
holds with an absolute constant. □
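The last step of the iteration uses that the recursion $D \mapsto 2^{1/4} c\sqrt{D}$ has the unique positive fixed point $\sqrt{2}\,c^2$ and contracts towards it. A minimal numerical sketch follows; the value of c and the starting constants are hypothetical, since the proof does not specify them.

```python
import math


def iterate_constant(d_start, c, doublings):
    """Apply D -> 2^(1/4) * c * sqrt(D) repeatedly and record the values."""
    d = d_start
    history = [d]
    for _ in range(doublings):
        d = 2**0.25 * c * math.sqrt(d)
        history.append(d)
    return history


if __name__ == "__main__":
    c = 3.0  # hypothetical Khintchine-type constant
    for start in (0.01, 100.0):
        print(start, iterate_constant(start, c, 10)[-1], math.sqrt(2.0) * c * c)
```

Starting above the fixed point the values decrease, and starting below they increase, so in either case the constants remain bounded by the larger of the starting value and $\sqrt{2}\,c^2$.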

Remark 3.1. The improved Rosenthal inequality allows us to extend Lust-Piquard's
noncommutative Khintchine inequality [24, 25] in a twisted setting. We refer to [9] for
unexplained notions on the gaussian measure space construction. The starting point is
a discrete group $G$ acting on a real Hilbert space $H$. This means we fix an isometry
$b : H \to L_2(\Omega, \Sigma, \mu)$ such that $b$ is linear and $b(h)$ is a centered gaussian random variable
with variance $\|h\|^2$. For example for $H = L_2(0, \infty)$ and $B_t = b(1_{[0,t]})$ we recover a
well-known method to construct Brownian motion. We may assume that $\Sigma$ is the minimal
sigma algebra generated by the random variables $b(H)$. Then the action of $G$ extends to
a family of measure preserving automorphisms $\alpha : G \to \mathrm{Aut}(L_\infty(\Omega, \Sigma, \mu))$ such that

$$\alpha_g(b(h)) = b(g.h).$$

This allows us to form the crossed product $\mathcal{M} = L_\infty(\Sigma) \rtimes G$. The crossed product is
spanned by random variables of the form

$$x = \sum_g f_g\,\lambda(g).$$

Here $\lambda(g)$ refers to the regular representation of the group. The algebraic structure is
determined by $\lambda(g) f \lambda(g^{-1}) = \alpha_g(f)$. The twisted gaussian random variables are of the
form

$$B = \sum_g b(h_g)\lambda(g), \qquad h_g \in H.$$

In order to formulate the Khintchine inequality we have to recall that there exists a trace
preserving conditional expectation $E : \mathcal{M} \to L(G)$. Here $L(G)$ is the von Neumann
subalgebra generated by the image $\lambda(G)$ and the trace is given by

$$\tau\Big(\sum_g f_g\,\lambda(g)\Big) = \int f_1\,d\mu.$$

Then we can deduce from Theorem 0.4 that for $p \ge 2$

(3.4)  $$\|B\|_p \le C\sqrt{p}\,\|E(B^*B + BB^*)^{1/2}\|_p.$$

Moreover, the span of the generalized gaussian random variables is complemented and
the inequality remains true with additional vector valued coefficients. This is a key fact
in proving noncommutative Riesz transforms. To illustrate (3.4) let us assume that the
action is trivial. Let $(e_k)$ be a basis and

$$B = \sum_{k,g} a(k, g)\,b(e_k) \otimes \lambda(g) = \sum_k b(e_k) \otimes a_k.$$

Then we find

$$E(BB^*) = \sum_k a_k a_k^*, \qquad E(B^*B) = \sum_k a_k^* a_k.$$

Thus the right hand side gives exactly the square function we expect for gaussian variables.
However, with a non-trivial additional group action $BB^*$ and $B^*B$ look quite different and
the group action interferes significantly.



Using (0.6), we can prove Corollary 0.5, which will play a central role in the application
to compressed sensing in the next section.

Proof of Corollary 0.5. By Jensen's inequality, we have

$$\mathbb{E}\Big\|\frac{1}{k}\sum_{i=1}^m \delta_i x_i - 1\Big\|_p^p \le \mathbb{E}\Big\|\frac{1}{k}\sum_{i=1}^m (\delta_i - \delta_i') x_i\Big\|_p^p,$$

where $(\delta_i')$ is a sequence of independent selectors with the same distribution as the $\delta_i$'s. In
order to apply Theorem 0.4 it is crucial to choose an appropriate probability space. Let
$(\Omega, \mathcal{F}, P)$ be the probability space generated by $(\delta_i, \delta_i')$. We consider the noncommutative
probability space as the algebra $\mathcal{M} = L_\infty(P) \otimes \mathcal{N}$. Then we have a normalized trace
$\tilde{\tau} = \mathbb{E} \otimes \tau$ on $\mathcal{M}$. We identify $\mathbb{E}$ as the conditional expectation $E : \mathcal{M} \to \mathcal{N}$. Clearly,
$((\delta_i - \delta_i')x_i)_{i=1}^m$ are fully independent over $\mathcal{N}$. Note that

$$\mathbb{E}(\delta_i - \delta_i')^2 = \frac{2k}{m}\Big(1 - \frac{k}{m}\Big) \le \frac{2k}{m} \quad\text{and}\quad \sup_{i=1,\ldots,m}|\delta_i - \delta_i'| \le 1.$$

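Since $x_i$ is positive, $x_i^* x_i = x_i^2$. Using (0.6), we obtain

$$\Big(\mathbb{E}\Big\|\sum_{i=1}^m (\delta_i - \delta_i')x_i\Big\|_{L_p(\mathcal{N},\tau)}^p\Big)^{1/p} = \Big\|\sum_{i=1}^m (\delta_i - \delta_i')x_i\Big\|_{L_p(\mathcal{M},\tilde{\tau})}$$
$$\le C\max\bigg\{\sqrt{p}\,\Big\|\Big(\sum_{i=1}^m \mathbb{E}(\delta_i - \delta_i')^2 x_i^2\Big)^{1/2}\Big\|_{L_p(\mathcal{N},\tau)},\ p\,\Big\|\sup_{i=1,\ldots,m}|\delta_i - \delta_i'|\,x_i\Big\|_{L_p(\mathcal{M},\tilde{\tau};\ell_\infty)}\bigg\}.$$

Since $\tau(1) = 1$ and $x_i \le r$, we obtain $\|\sup_i |\delta_i - \delta_i'|\,x_i\|_{L_p(\mathcal{M},\tilde{\tau};\ell_\infty)} \le r$ and

$$\sum_{i=1}^m \mathbb{E}(\delta_i - \delta_i')^2 x_i^2 \le \frac{2k}{m}\,r\sum_{i=1}^m x_i = 2kr.$$

Therefore, we find

$$\Big(\mathbb{E}\Big\|\frac{1}{k}\sum_{i=1}^m \delta_i x_i - 1\Big\|_p^p\Big)^{1/p} \le \frac{1}{k}\Big\|\sum_{i=1}^m (\delta_i - \delta_i')x_i\Big\|_{L_p(\mathcal{M},\tilde{\tau})} \le C\max\Big\{\sqrt{\frac{2pr}{k}},\ \frac{pr}{k}\Big\}.$$
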

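We have completed the proof of (0.7) with constant $\sqrt{2}\,C$. For the "moreover" part, we
use the additional norm assumption and obtain

$$\Big\|\frac{1}{k}\sum_{i=1}^m \delta_i x_i - 1\Big\|_{L_\infty(\mathrm{tr})} \le \Big\|\frac{1}{k}\sum_{i=1}^m \delta_i x_i - 1\Big\|_{L_p(\mathrm{tr})}.$$

Then by Chebyshev's inequality and (0.7) for the trace $\tau(x) = \mathrm{tr}(x)/\mathrm{tr}(1)$, we have

$$P\Big(\Big\|\frac{1}{k}\sum_{i=1}^m \delta_i x_i - 1\Big\|_{L_\infty(\mathrm{tr})} \ge t\varepsilon\Big) \le (t\varepsilon)^{-p}\,\mathbb{E}\Big\|\frac{1}{k}\sum_{i=1}^m \delta_i x_i - 1\Big\|_{L_p(\mathrm{tr})}^p$$
$$\le \mathrm{tr}(1)\max\bigg\{\Big(\frac{C^2 pr}{k t^2\varepsilon^2}\Big)^{p/2},\ \Big(\frac{Cpr}{kt\varepsilon}\Big)^p\bigg\}.$$
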

Let us first assume $t\varepsilon \le C$. Optimizing the first term in p, we find $p = t^2\varepsilon^2 k/(C^2 re)$. Recall
that $k = r\varepsilon^{-2}$. Then the first term becomes $e^{-t^2/(2C^2 e)}$. Using $t\varepsilon \le C$, this choice of p gives
an upper bound of $e^{-t^2/(C^2 e)}$ for the second term. Now assume $t\varepsilon \ge C$. The optimal choice
for the second term is obtained for $p = kt\varepsilon/(Cre)$. Then the second term becomes $e^{-t/(Ce\varepsilon)}$
and, thanks to $t\varepsilon \ge C$, the first term is less than $e^{-t/(2Ce\varepsilon)}$. The additional assumption on
t guarantees that $p \ge 2.5$ in both cases. Therefore,

$$P\Big(\Big\|\frac{1}{k}\sum_{i=1}^m \delta_i x_i - 1\Big\|_{L_\infty(\mathrm{tr})} \ge t\varepsilon\Big) \le \mathrm{tr}(1)\begin{cases} e^{-t^2/(2C^2 e)} & \text{if } t\varepsilon \le C, \\ e^{-t/(2Ce\varepsilon)} & \text{if } t\varepsilon \ge C. \end{cases}$$

The constant C is the same as the constant in the first assertion. □
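The optimization over p in the last step can be replayed numerically. The sketch below evaluates the two terms inside the maximum (with $r/k = \varepsilon^2$) at the values of p chosen in the proof and compares them with the right-hand side of (0.8) without the factor tr(1). The numerical values of C, $\varepsilon$ and t are hypothetical.

```python
import math


def chebyshev_terms(p, t, eps, c):
    """The two terms inside the max, using r / k = eps^2."""
    first = (c * c * p / (t * t)) ** (p / 2.0)
    second = (c * p * eps / t) ** p
    return first, second


def chosen_p(t, eps, c):
    """The value of p selected in the proof in each of the two regimes."""
    if t * eps <= c:
        return t * t / (c * c * math.e)
    return t / (c * eps * math.e)


def tail_bound(t, eps, c):
    """Right-hand side of (0.8) without the factor tr(1)."""
    if t * eps <= c:
        return math.exp(-t * t / (2.0 * c * c * math.e))
    return math.exp(-t / (2.0 * c * math.e * eps))


if __name__ == "__main__":
    c, eps = 1.0, 0.1  # hypothetical values
    for t in (5.0, 20.0):
        p = chosen_p(t, eps, c)
        print(t, p, max(chebyshev_terms(p, t, eps, c)), tail_bound(t, eps, c))
```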

Remark 3.2. In this context it is useful to compare our different generalizations of
Rosenthal's inequality. We observe that with Corollary 0.3 we can only obtain



E 



i=l 



and with inequality (0.5) we obtain

- m 



E 



i=i 



i/p 



k k 1 - 1 ^ J ' 



Both estimates are worse than inequality (0.7).



The following two examples are meant to justify the optimality of $\sqrt{p}$ and $p$. We
refer the reader to [28] for a more detailed discussion on this topic in the framework of
classical probability. We will use the standard notation for comparing orders of functions
as $p \to \infty$. Recall that $f(p) = O(g(p))$ if there exists a constant C such that $f(p) \le
Cg(p)$ asymptotically, $f(p) = \Omega(g(p))$ if there exists a constant c such that $f(p) \ge cg(p)$
asymptotically, $f(p) = \Theta(g(p))$ if there exist constants c and C such that $cg(p) \le f(p) \le
Cg(p)$ asymptotically, and $f(p) \sim g(p)$ if $\lim_{p \to \infty} f(p)/g(p) = 1$.

Example 3.3 (The optimality of $\sqrt{p}$ in Theorem 0.4). Let us assume that



(3.5) 



i=l 



< A(p) 



,1=1 




for some functions A(p) and B(p). We use X\ = g{. Here (gi) is a sequence of IID normal 



random variables with mean $0$ and variance $1$. We know

\[
\mathbb{E}|g_1|^p = \frac{2^{p/2}\,\Gamma\big(\frac{p+1}{2}\big)}{\sqrt{\pi}}.
\]

By Stirling's formula, we obtain for large $p$,

\[
\|g_1\|_p \sim \sqrt{p/e}.
\]

This yields that there exist absolute constants $c$ and $C$ such that $c\sqrt{p} \le \|g_1\|_p \le C\sqrt{p}$ for all $p \ge 2$. Hence, we obtain

\[
c\sqrt{p} \le \|g_1\|_p = \Big\|\frac{1}{\sqrt{n}}\sum_{i=1}^{n} g_i\Big\|_p \le A(p) + CB(p)\sqrt{p}\,n^{\frac{1}{p}-\frac{1}{2}}.
\]

Sending $n \to \infty$, we have

\[
A(p) \ge c\sqrt{p} \quad \text{for } p > 2.
\]

This shows that one can not reduce the order of $A(p)$ even at the expense of increasing the order of $B(p)$.
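The growth rate $\|g_1\|_p \sim \sqrt{p/e}$ used in Example 3.3 can be checked numerically from the closed formula for $\mathbb{E}|g_1|^p$. The sketch below is only an illustration; the function name `gaussian_p_norm` is ours, and the computation is done in log space to avoid overflow of the Gamma function for large $p$.

```python
import math

def gaussian_p_norm(p):
    """||g||_p for a standard normal g, from E|g|^p = 2^{p/2} Gamma((p+1)/2) / sqrt(pi)."""
    log_moment = (p / 2) * math.log(2) + math.lgamma((p + 1) / 2) - 0.5 * math.log(math.pi)
    return math.exp(log_moment / p)

# The ratio to sqrt(p/e) should approach 1 as p grows.
for p in (2, 10, 100, 1000):
    print(p, gaussian_p_norm(p) / math.sqrt(p / math.e))
```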

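Example 3.4 (The optimality of $p$ in Theorem 0.4). Following Corollary 0.5 we consider random selectors on $\Omega = \{1\}$, i.e. $x_i = 1$ and $\mathbb{E}\delta_j = \lambda = k/m$. Then we shall assume that

\[
\Big\|\frac{1}{k}\sum_{j=1}^{m}\delta_j - 1\Big\|_p \le C\sqrt{\frac{p}{k}} + \frac{f(p)}{k}
\]

for some function $f(p)$. Here we choose $m = p$ and $k = \alpha p$ for some very small $\alpha$. Then we find that for every $1 \le j \le m$

\[
\Big|\frac{j}{k} - 1\Big|\binom{m}{j}^{1/m}\lambda^{j/m}(1-\lambda)^{1-j/m} \le C\sqrt{\frac{p}{k}} + \frac{f(p)}{k}.
\]

Let us first fix $j = [\gamma m]$ and assume that $\gamma \ge 1/4$ and $1/2^m \le \alpha \le 1/8$. This gives $\frac{j}{k} \ge \frac{\gamma}{\alpha} \ge \frac{1}{4\alpha} \ge 2$ and hence

\[
\Big|\frac{j}{k} - 1\Big| \ge \frac{j}{2k} \ge \frac{1}{8\alpha}.
\]

Note that $1 \le \binom{m}{j}^{1/m} \le 2$ so that we can not expect any help here. Thus we find

\[
\frac{1}{16}\alpha^{\gamma-1}(1-\alpha)^{1-\gamma} \le \frac{1}{8}\alpha^{\gamma-1+1/m}(1-\alpha)^{1-\gamma} \le C\alpha^{-1/2} + \frac{f(p)}{\alpha p}.
\]

Let us now fix $\gamma = 1/4$ and choose $\alpha$ such that

\[
2C\alpha^{-1/2} \le \frac{1}{16}\Big(\frac{1-\alpha}{\alpha}\Big)^{3/4}
\]

or equivalently

\[
32C\alpha^{1/4} \le (1-\alpha)^{3/4}.
\]

However $\alpha \le 1/8$ implies $1 - \alpha \ge 7/8$. Thus

\[
\alpha \le \frac{(7/8)^3}{(32C)^4}
\]

will do. Then we find

\[
\frac{1}{32}\alpha^{1/4}(1-\alpha)^{3/4}\,p \le f(p).
\]

Choose $\alpha = (7/8)^3/(32C)^4$. Then we have

\[
C_0\frac{p}{C} \le f(p) \tag{3.6}
\]

for an absolute constant $C_0 = (7/8)^{3/2}/32^2$. This shows that one can not reduce the order of $f(p)$, as long as we keep $A(p) \le C\sqrt{p}$ in (3.5).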
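The single-term lower bound of Example 3.4 can be compared with the exact $p$-norm of $\frac{1}{k}\sum_j \delta_j - 1$, which is computable because $\sum_j\delta_j$ is binomial. The following is a rough sketch under the stated choices $m = p$, $k = \alpha p$, $\lambda = \alpha$ and $\gamma = 1/4$; the function names and the sample values of $p$ and $\alpha$ are illustrative assumptions, not taken from the text.

```python
import math

def selector_sum_p_norm(p, alpha):
    """Exact ||S/k - 1||_p for S ~ Binomial(m, lam) with m = p, k = alpha * p, lam = k/m."""
    m = p
    k = alpha * p
    lam = k / m
    total = 0.0
    for j in range(m + 1):
        prob = math.comb(m, j) * lam**j * (1 - lam) ** (m - j)
        total += prob * abs(j / k - 1) ** p
    return total ** (1 / p)

def single_term_lower_bound(alpha, gamma=0.25):
    """The quantity alpha^(gamma-1) * (1-alpha)^(1-gamma) / 16 from Example 3.4."""
    return alpha ** (gamma - 1) * (1 - alpha) ** (1 - gamma) / 16

# Illustrative check with 2^{-m} <= alpha <= 1/8.
for p, alpha in ((40, 0.1), (80, 0.05)):
    print(p, alpha, selector_sum_p_norm(p, alpha), single_term_lower_bound(alpha))
```

In both sample cases the exact norm dominates the single-term bound, as the argument predicts.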

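Remark 3.5. In fact, Example 3.4 provides more information. Instead of fixing $\gamma$, by sending $\gamma \to 0$ and choosing $\alpha \le \gamma/2$ appropriately we can find a different behavior. Indeed, then we have $|j/k - 1| \ge \gamma/(2\alpha)$ and

\[
\frac{\gamma}{4}\alpha^{\gamma-1}(1-\alpha)^{1-\gamma} \le C\alpha^{-1/2} + \frac{f(p)}{\alpha p},
\]

and since $\alpha \le \gamma$ and $(1-\gamma)^{1-\gamma} \ge e^{-1}$ we need $8eC\alpha^{-1/2} \le \gamma\alpha^{\gamma-1}$ or

\[
\alpha^{\frac{1}{2}-\gamma} \le \frac{\gamma}{8eC}.
\]

Note that $\big(\frac{\gamma}{8eC}\big)^{2/(1-2\gamma)} \le \gamma/2$ for $\gamma < 1$. Hence with

\[
\alpha \le \Big(\frac{\gamma}{8eC}\Big)^{2/(1-2\gamma)}
\]

we have

\[
\frac{\gamma}{8e}\,\alpha^{\gamma}\,p \le f(p).
\]

Put $\alpha = \big(\frac{\gamma}{8eC}\big)^{2/(1-2\gamma)}$. Then we obtain

\[
\Big(\frac{\gamma}{8eC}\Big)^{\frac{1}{1-2\gamma}} Cp \le f(p).
\]

Optimizing the left hand side in $\gamma$, we obtain $2\gamma\log(8e^2C) - 2\gamma\log(\gamma) = 1$. Since $\gamma\log\gamma \to 0$ as $\gamma \to 0$, we choose

\[
\gamma = \frac{1}{2\log(8e^2C)}
\]

and find

\[
\Big(\frac{1}{16eC\log(8e^2C)}\Big)^{\frac{1}{1-2\gamma}} Cp \le f(p).
\]

In order to obtain a lower bound for $f(p)$, we need to assume $8C \ge 1$ so that $\gamma \le 1/4$. This yields for $C \ge 1.5$,

\[
f(p) \ge \frac{p}{32\sqrt{2}\,e^{3/2+2/e}\log(8e^2C)} \ge c_1\frac{p}{\log C} \tag{3.7}
\]

for some absolute constant $c_1$. Compare (3.7) with (3.6). The estimate (3.7) is better for large $C$. Let us now fix $p$ and put $C = p^a$. Example 3.3 shows that $a$ has to be nonnegative. (3.7) implies that for $a > 0$

\[
f(p) \ge \frac{c_1}{a}\,\frac{p}{\log p}.
\]

In particular, for $C = \sqrt{p}/\log p$, we obtain $f(p) \ge 2c_1\,p/\log p$, which recovers the best constants obtained in [17].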



Example 3.3 and Remark 3.5 yield the following result.



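Theorem 3.6. Under the hypotheses of Theorem 0.4, assume that

\[
\Big\|\sum_{i=1}^{n} x_i\Big\|_p \le A(p)\Big\|\Big(\sum_{i=1}^{n} E_N(x_i^2)\Big)^{1/2}\Big\|_p + B(p)\Big(\sum_{i=1}^{n}\|x_i\|_p^p\Big)^{1/p}
\]

for some functions $A(p)$ and $B(p)$. Then we have

i) The best possible order of the lower bound for $A(p)$ is $\sqrt{p}$, which can not be improved even if the order of $B(p)$ is increased.

ii) If $\Omega(p/\log p) = A(p) = O(p^\beta)$ where $\beta \ge 1$, then the best order of $B(p)$ is $p/\log p$.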



In the commutative case, i) was proved by Pinelis and Utev in [28]. To the best of 
our knowledge, ii) is new even in the commutative setting, which shows that the order 
p/logp of B(p) can not be reduced at the expense of increasing A(p) to any polynomial 
order. It was proved in [28] that if one reduces B(p) to a constant then the best order of 
A(p) is exponential in p and in this case B(p) can not be improved even at the expense of 
increasing the order of $A(p)$. Unfortunately, it is still unclear what is the optimal choice of $B(p)$ for fixed $\Omega(\sqrt{p}) = A(p) = O(p/\log p)$. All we know is that $\Omega(p/\log p) = B(p) = O(p)$ and $p/\log p$ is sharp when $A(p) = \Omega(p/\log p)$.



4. Application to compressed sensing 



In this section, we indicate how our Rosenthal type inequality applies to certain problems 
in Compressed Sensing Theory. Let us briefly recall the background here following [61IH1I35] . 
We want to reconstruct an unknown signal $f \in \mathbb{C}^n$ from linear measurements $\Phi f \in \mathbb{C}^k$, where $\Phi$ is some known $k \times n$ matrix called the measurement matrix. The reconstruction problem is stated as

\[
\min \|f^*\|_0 \quad \text{subject to} \quad \Phi f^* = \Phi f \tag{4.1}
\]

where $\|f\|_0 = |\mathrm{supp} f|$ is the number of nonzero elements of $f$. Since this problem is computationally expensive, we consider its convex relaxation instead:

\[
\min \|f^*\|_1 \quad \text{subject to} \quad \Phi f^* = \Phi f \tag{4.2}
\]

where $\|f\|_p = \big(\sum_{j=1}^{n}|f_j|^p\big)^{1/p}$ denotes the $\ell_p$ norm throughout this section. Exact reconstruction means that the solutions to (4.1) and (4.2) are both equal to $f$. Here $f$ is assumed to be $s$-sparse, i.e. $|\mathrm{supp} f| \le s$. We refer to [HIES] for why (4.2) is a good substitute of (4.1). The restricted isometry property (RIP) on $\Phi$, due to Candès and Tao [7] (see also [5]), is an extremely important tool for exact reconstruction. Let $\Phi_T$ denote the $k \times |T|$ matrix consisting of the columns of $\Phi$ indexed by $T$. The RIP constant $\Delta_s$ is defined to be the smallest positive number such that the inequality

\[
C(1-\Delta_s)\|x\|_2^2 \le \|\Phi_T x\|_2^2 \le C(1+\Delta_s)\|x\|_2^2
\]

holds for some number $C > 0$ and for all $x \in \ell_2^{|T|}$ and all subsets $T \subset \{1, \dots, n\}$ of size $|T| \le s$. Candès and Tao proved the following theorem [5, 7]:

Theorem 4.1. Let $f$ be an $s$-sparse signal and $\Phi$ be a measurement matrix whose RIP constant satisfies

\[
\Delta_{3s} + 3\Delta_{4s} < 2.
\]

Then $f$ can be recovered exactly.



Since $\Delta_s$ is nondecreasing in $s$, in order to verify RIP, it suffices to show that

\[
\Delta_{4s} < \frac{1}{2}
\]

or simply $\Delta_s < \frac{1}{2}$ by adjusting constants if necessary. In this section, we apply Corollary 0.5 to study the problem of reconstruction from Fourier measurements. Two cases will be considered. In the first case we fix the support $T$ of $f$. In the second case we allow it to vary. In the following, $C$ will always denote the constant in Corollary 0.5 and $\mathbb{C}^m$ will always denote the $m$-dimensional complex Euclidean space equipped with the $\ell_2$ norm.

Example 4.2 (Fourier measurements). We consider the discrete Fourier transform $\hat{f} = \Psi f$ where $\Psi$ is a matrix with entries

\[
\Psi_{\omega,t} = \frac{1}{\sqrt{n}}\,e^{-i2\pi\omega t/n}, \quad \omega, t \in \{0, \dots, n-1\}.
\]

We want to reconstruct an $s$-sparse signal $f \in \mathbb{C}^n$ from linear measurements $\Phi f \in \mathbb{C}^{\Omega}$, where $\Omega \subset \{0, \dots, n-1\}$ is a uniformly random subset with average cardinality $k$ and the measurement matrix $\Phi$ is a submatrix of $\Psi$ consisting of random rows with indices in $\Omega$. This is the Fourier measurement matrix considered in (6J[H1|35]. We can formulate this random subset precisely using the Bernoulli model. Let $(\delta_j)_{j=0}^{n-1}$ be a sequence of independent selectors with $\mathbb{E}\delta_j = k/n$, for $j = 0, \dots, n-1$. Then

\[
\Omega = \{j : \delta_j = 1\} \quad \text{and} \quad k = \mathbb{E}|\Omega|.
\]
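The Bernoulli model for the random set $\Omega$ is straightforward to simulate. The sketch below is only illustrative: the function name `random_frequency_set`, the seed and the sample sizes are our own choices.

```python
import random

def random_frequency_set(n, k, rng):
    """Bernoulli model: keep each index j in {0, ..., n-1} independently with probability k/n."""
    return [j for j in range(n) if rng.random() < k / n]

rng = random.Random(0)  # fixed seed so that the experiment is reproducible
sizes = [len(random_frequency_set(200, 20, rng)) for _ in range(2000)]
average_size = sum(sizes) / len(sizes)
print(average_size)  # empirical counterpart of k = E|Omega|
```

The cardinality $|\Omega|$ is random and only its expectation equals $k$, which is why the text speaks of average cardinality.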



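Let $y_j$ be the $j$-th row of $\Psi$ and $T$ the support of $f$. Write $y_j^T$ for the restriction of $y_j$ to the coordinates in the set $T$. For $x, y, z \in \mathbb{C}^n$, we define the tensor $x \otimes y$ as the rank-one linear operator given by $(x \otimes y)(z) = \langle x, z\rangle y$. Then

\[
\Phi^*\Phi = \sum_{j\in\Omega} y_j^T \otimes y_j^T = \sum_{j=0}^{n-1}\delta_j\, y_j^T \otimes y_j^T.
\]

Let $x_j = n\, y_j^T \otimes y_j^T$. Then

\[
\frac{1}{n}\sum_{j=0}^{n-1} x_j = \mathrm{id}_{\mathbb{C}^T} = I_T \quad \text{and} \quad \|x_j\| = n\|y_j^T \otimes y_j^T\| = n\|y_j^T\|_2^2 \le s.
\]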
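Both identities for the operators $x_j$ can be verified numerically for a small example. The following sketch assumes the unitary normalization $n^{-1/2}$ and the sign convention $e^{-2\pi i\omega t/n}$ for the Fourier matrix (the identities do not depend on the sign); the function names and the choice of $n$ and $T$ are hypothetical.

```python
import cmath

def restricted_row(row, support, n):
    """Row `row` of the unitary n x n DFT matrix, restricted to the coordinates in `support`."""
    return [cmath.exp(-2j * cmath.pi * row * t / n) / n ** 0.5 for t in support]

def averaged_operator(support, n):
    """Matrix of (1/n) * sum_j x_j, where x_j = n * (y_j^T tensor y_j^T)."""
    size = len(support)
    total = [[0j] * size for _ in range(size)]
    for row in range(n):
        y = restricted_row(row, support, n)
        for a in range(size):
            for b in range(size):
                total[a][b] += y[a] * y[b].conjugate()
    return total

n, T = 16, [1, 4, 5, 11]
avg = averaged_operator(T, n)
# Operator norm of the rank-one operator x_0 = n * (y_0^T tensor y_0^T) is n * ||y_0^T||_2^2.
norm_x0 = n * sum(abs(v) ** 2 for v in restricted_row(0, T, n))
print(norm_x0)
```

Here `avg` agrees with the identity matrix on $\mathbb{C}^T$ up to rounding, and `norm_x0` equals $|T|$.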

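The next proposition follows easily from Corollary 0.5.

Proposition 4.3. Assume that the average cardinality of a random set $\Omega$ is $k = \varepsilon^{-2}s$. Then for $t\varepsilon \le C$,

\[
\mathbb{P}\Big(\Big\|\frac{1}{k}\sum_{j=0}^{n-1}\delta_j x_j - \mathrm{id}_{\mathbb{C}^T}\Big\| > t\varepsilon\Big) \le s\,e^{-t^2/(2C^2e)} \tag{4.3}
\]

where $\|\cdot\|$ is the operator norm.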



Define

\[
H := \mathrm{id}_{\mathbb{C}^T} - \frac{n}{|\Omega|}\sum_{j=0}^{n-1}\delta_j\, y_j^T \otimes y_j^T.
\]

Then $\Phi^*\Phi = \frac{|\Omega|}{n}(I_T - H)$. By the classical Bernstein inequality, $k/2 \le |\Omega| \le 3k/2$ with high probability, see [8, Lemma 6.6]. Therefore, by choosing $t\varepsilon < 1$, we find that the matrix $I_T - H$ is invertible with high probability. The precise meaning of "high probability" will become clear in a moment. This proposition is an analog of [HI Theorem 3.1] and [3"o"| Theorem 3.3] with a single set $T$. We compare our results with previous results in the following remark. It is easy to show that $\mathbb{P}(k/2 \le |\Omega| \le 3k/2)$ given by Bernstein's inequality dominates $1 - s\,e^{-t^2/(2C^2e)}$ for the value of $k$ given below. Hence we only need to consider (4.3) for the probability of success.

Remark 4.4. i) For a single set $T$ our result is more general than previous results on the invertibility of $\Phi^*\Phi$ obtained by Candès, Romberg, and Tao in the breakthrough paper [6]. In particular, if we put $t\varepsilon = 1/2$ and $\varepsilon^{-2} = 8C^2e(M\log n + \log s)$ for some $M > 0$, then we obtain $k = c_M s\log n$ for some constant $c_M$ and $I_T - H$ is invertible with probability at least $1 - O(n^{-M})$. This gives [6, Theorem 3.1]. Together with (HI Lemma 2.3], or following verbatim the end of the proof of [33, Theorem 4.2 (section 7.3)], we recover the main results of [6].
ii) Allowing arbitrary choices of $k$ and $p$ we recover [33, Theorem 7.3], and we would like to thank H. Rauhut for bringing this to our attention. His proof requires considerably more technology. Both proofs are based on the optimal constant in the noncommutative Khintchine inequality (used in Rudelson's lemma) which was discovered independently by the first named author and Pisier (see [30] for more historic comments). We believe that our proof is more direct. Moreover, he established the exact reconstruction results based on his version of (4.3) cited above, which shows that an estimate like (4.3) is the key to the exact reconstruction problem.
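The parameter choice in Remark 4.4 i) can be traced through (4.3) numerically: with $t\varepsilon = 1/2$ and $\varepsilon^{-2} = 8C^2e(M\log n + \log s)$, the exponent $t^2/(2C^2e)$ equals $M\log n + \log s$, so the failure bound $s\,e^{-t^2/(2C^2e)}$ collapses to $n^{-M}$. The sketch below illustrates this; the function name is ours and the value $C = 1$ is an assumed placeholder for the unspecified absolute constant.

```python
import math

def single_support_parameters(n, s, M, C):
    """Return (k, failure_bound) for t * eps = 1/2 and eps^{-2} = 8 C^2 e (M log n + log s)."""
    eps_inv_sq = 8 * C**2 * math.e * (M * math.log(n) + math.log(s))
    k = s * eps_inv_sq          # average cardinality k = s * eps^{-2}
    t_sq = eps_inv_sq / 4       # from t = 1 / (2 * eps)
    failure_bound = s * math.exp(-t_sq / (2 * C**2 * math.e))
    return k, failure_bound

k, failure = single_support_parameters(n=1024, s=10, M=2, C=1.0)
print(k, failure)
```

Since $s \le n$, the resulting $k$ is at most $8C^2e(M+1)\,s\log n$, which is the form $k = c_M s\log n$ quoted in the remark.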



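We now investigate the case with multiple choices of $T$. First, it is clear that (4.3) remains valid for polynomially many sets $T$. In general, we have

\[
\mathbb{P}\Big(\sup_{|T|\le s}\Big\|\frac{1}{k}\sum_{j=0}^{n-1}\delta_j x_j - \mathrm{id}_{\mathbb{C}^T}\Big\| > t\varepsilon\Big) \le |S|\,s\,e^{-t^2/(2C^2e)} \tag{4.4}
\]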



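where $|S|$ denotes the number of sets $T$ with $|T| \le s$. Note that

\[
\Delta_s = \inf_{a>0}\sup_{|T|\le s}\Big\|a\sum_{j\in\Omega} y_j^T \otimes y_j^T - \mathrm{id}_{\mathbb{C}^T}\Big\|.
\]

It follows that

\[
\mathbb{P}(\Delta_s > t\varepsilon) \le \mathbb{P}\Big(\sup_{|T|\le s}\Big\|\frac{1}{k}\sum_{j=0}^{n-1}\delta_j x_j - \mathrm{id}_{\mathbb{C}^T}\Big\| > t\varepsilon\Big).
\]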



Assume $s \le n/2$. Since $|S| \le s\binom{n}{s} + 1 \le s(ne/s)^s$, if

\[
2\log s + s\log\frac{ne}{s} \le \frac{t^2}{2C^2e}, \tag{4.5}
\]

then with probability at least $1 - s^2(ne/s)^s e^{-t^2/(2C^2e)}$ we can recover all $s$-sparse signals $f$ from their Fourier measurements $\Phi f$. From here we are able to obtain different bounds for $k$ and the corresponding probabilities of success. As an illustration, we have the following result.

Proposition 4.5. Assume $s \le n/2$. Let $M > 0$ be a precision constant and $n$ be a large integer such that

\[
2\log s + s\log\frac{ne}{s} \le (M+1)\,s\log\frac{n}{s}.
\]

Then a random subset $\Omega$ of average cardinality

\[
k = 8C^2e(M+1)\,s^2\log\frac{n}{s} = c_M s^2\log\frac{n}{s} \tag{4.6}
\]

satisfies RIP with probability at least $1 - s^2e^s(n/s)^{-Ms}$.

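Proof. Put $t\varepsilon = 1/2$ in (4.4). Since $k = s\varepsilon^{-2}$, we obtain $t^2 = 2C^2e(M+1)\,s\log(n/s)$. Thanks to the assumption on $n$, (4.5) is true. Then

\[
\mathbb{P}\Big(\Delta_s > \frac{1}{2}\Big) \le s^2\Big(\frac{ne}{s}\Big)^{s}\Big(\frac{n}{s}\Big)^{-(M+1)s} = s^2e^s\Big(\frac{n}{s}\Big)^{-Ms}.
\]

We have proved the assertion. □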
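The counting estimate $|S| \le s\binom{n}{s} + 1 \le s(ne/s)^s$ and the resulting union bound can be checked for small parameters. This is a minimal sketch: the function names are ours, the empty support is counted in $|S|$, and $C$ again stands for the unspecified absolute constant.

```python
import math

def support_count(n, s):
    """Number of subsets T of {0, ..., n-1} with |T| <= s (the empty set included)."""
    return sum(math.comb(n, size) for size in range(s + 1))

def union_failure_bound(n, s, t, C):
    """s^2 * (n e / s)^s * exp(-t^2 / (2 C^2 e)), evaluated in log space."""
    log_bound = 2 * math.log(s) + s * math.log(n * math.e / s) - t**2 / (2 * C**2 * math.e)
    return math.exp(log_bound)

# The two counting inequalities for a few pairs with s <= n/2.
for n, s in ((10, 1), (10, 5), (64, 8), (100, 50)):
    middle = s * math.comb(n, s) + 1
    print(n, s, support_count(n, s) <= middle <= s * (n * math.e / s) ** s)
```

When (4.5) holds with equality the union bound equals $1$, so the estimate only becomes informative once $t^2/(2C^2e)$ exceeds $2\log s + s\log(ne/s)$.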

Remark 4.6. We can relax the bound for $k$ a little to obtain polynomial probability of success. Indeed, the same argument as Proposition 4.5 yields that a random subset $\Omega$ of average cardinality

\[
k = 8C^2e(M+1)\,s^2\log n = c_M s^2\log n \tag{4.7}
\]

satisfies RIP with probability $1 - s^{2-s}e^s n^{-Ms}$.

The good aspect of Proposition 4.5 is that $k$ is linear in $\log n$. Unfortunately, this is weaker than Rudelson and Vershynin's result in [35], $k = O(s\log n\,\log(s\log n)\log^2 s)$ for fixed probability $1 - \varepsilon$ of success, which was strengthened to super-polynomial probability of success by Rauhut following their ideas, see [33]. These results are obtained by using deep Banach space techniques. We added our results just for comparison. Of course, simple applications of Khintchine's inequality are not expected to replace either majorizing measure techniques or the iterative methods of [33] for the uniform estimates required for RIP. It seems known in the compressed sensing community that the tail bounds alone are not good enough. To conclude this section, we restate a conjecture on the best bound of $k$, see [35] (and [33] for further background).

Conjecture 4.7. A random subset $\Omega \subset \{0, 1, \dots, n-1\}$ of average cardinality $k = O(s\log n)$ satisfies RIP with high probability.

Acknowledgements. We would like to thank W. B. Johnson for bringing [28] to our attention. We also thank K. Lee for helpful discussions on compressed sensing. After our work was completed, we learned from S. Dirksen that he also essentially obtained (0.5) in his Ph.D. Thesis [13] using a different method in the UIUC analysis seminar on November 3, 2011. We are also grateful to H. Rauhut for his comments on section 4 after he read our paper from arxiv.org, which have improved our statements.



References 

[1] R. Ahlswede and A. Winter, Strong converse for identification via quantum channels, IEEE Trans. Inform. Theory 48 (2002), no. 3, 569-579.

[2] H. Araki, Golden-Thompson and Peierls-Bogolubov inequalities for a general von Neumann algebra, Comm. Math. Phys. 34 (1973), 167-178. MR0341114 (49 #5864)

[3] G. Bennett, Probability inequalities for the sum of independent random variables, Journal of the American Statistical Association 57 (1962), no. 297, 33-45.

[4] D. L. Burkholder, Distribution function inequalities for martingales, Ann. Probability 1 (1973), 19-42. MR0365692 (51 #1944)

[5] E. Candès, M. Rudelson, T. Tao, and R. Vershynin, Error correction via linear programming, Foundations of Computer Science, 2005. FOCS 2005. 46th Annual IEEE Symposium on, 2005, pp. 668-681.

[6] E. J. Candès, J. Romberg, and T. Tao, Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information, IEEE Trans. Inform. Theory 52 (2006), no. 2, 489-509. MR2236170 (2007e:94020)

[7] E. J. Candès and T. Tao, Decoding by linear programming, IEEE Trans. Inform. Theory 51 (2005), no. 12, 4203-4215. MR2243152 (2007b:94313)

[8] E. J. Candès and T. Tao, Near-optimal signal recovery from random projections: universal encoding strategies?, IEEE Trans. Inform. Theory 52 (2006), no. 12, 5406-5425. MR2300700 (2008c:94009)

[9] P.-A. Cherix, M. Cowling, P. Jolissaint, P. Julg, and A. Valette, Groups with the Haagerup property, Progress in Mathematics, vol. 197, Birkhäuser Verlag, Basel, 2001. Gromov's a-T-menability. MR1852148 (2002h:22007)

[10] B. Collins and M. Junge, What is a noncommutative Brownian motion?, Preprint (2011).

[11] J. B. Conway, A course in functional analysis, Second edition, Graduate Texts in Mathematics, vol. 96, Springer-Verlag, New York, 1990. MR1070713 (91e:46001)

[12] A. Dembo and O. Zeitouni, Large deviations techniques and applications, Second edition, Applications of Mathematics (New York), vol. 38, Springer-Verlag, New York, 1998. MR1619036 (99d:60030)

[13] S. Dirksen, Noncommutative and Vector-valued Rosenthal Inequalities, Preprint, 2011. Thesis (Ph.D.)-Delft University of Technology.

[14] T. Fack and H. Kosaki, Generalized s-numbers of $\tau$-measurable operators, Pacific J. Math. 123 (1986), no. 2, 269-300. MR840845 (87h:46122)

[15] D. Gross, Recovering low-rank matrices from few coefficients in any basis, IEEE Trans. Inform. Theory 57 (2011), no. 3, 1548-1566.

[16] P. Hitczenko, Best constants in martingale version of Rosenthal's inequality, Ann. Probab. 18 (1990), no. 4, 1656-1668. MR1071816 (92a:60048)




[17] W. B. Johnson, G. Schechtman, and J. Zinn, Best constants in moment inequalities for linear combinations of independent and exchangeable random variables, Ann. Probab. 13 (1985), no. 1, 234-253. MR770640 (86i:60054)

[18] M. Junge, Operator spaces and Araki-Woods factors: a quantum probabilistic approach, IMRP Int. Math. Res. Pap. (2006), Art. ID 76978, 87. MR2268491 (2009k:46118)

[19] M. Junge, Doob's inequality for non-commutative martingales, J. Reine Angew. Math. 549 (2002), 149-190. MR1916654 (2003k:46097)

[20] M. Junge and Q. Xu, Noncommutative Burkholder/Rosenthal inequalities, Ann. Probab. 31 (2003), no. 2, 948-995. MR1964955 (2004f:46078)

[21] M. Junge and Q. Xu, On the best constants in some non-commutative martingale inequalities, Bull. London Math. Soc. 37 (2005), no. 2, 243-253. MR2119024 (2005k:46170)

[22] M. Junge and Q. Xu, Noncommutative Burkholder/Rosenthal inequalities. II. Applications, Israel J. Math. 167 (2008), 227-282. MR2448025 (2010c:46141)

[23] S. Kwapień and W. A. Woyczyński, Tangent sequences of random variables: basic inequalities and their applications, Almost everywhere convergence (Columbus, OH, 1988), 1989, pp. 237-265. MR1035249 (91c:60020)

[24] F. Lust-Piquard, Inégalités de Khintchine dans $C_p$ ($1 < p < \infty$), C. R. Acad. Sci. Paris Sér. I Math. 303 (1986), no. 7, 289-292. MR859804 (87j:47032)

[25] F. Lust-Piquard and G. Pisier, Noncommutative Khintchine and Paley inequalities, Ark. Mat. 29 (1991), no. 2, 241-260. MR1150376 (94b:46011)

[26] S. V. Nagaev and I. F. Pinelis, Some inequalities for the distribution of sums of independent random variables, Theory of Probability and its Applications 22 (1978), no. 2, 248-256.

[27] D. Petz, A survey of certain trace inequalities, Functional analysis and operator theory (Warsaw, 1992), 1994, pp. 287-298. MR1285615 (95c:15038)

[28] I. F. Pinelis and S. A. Utev, Estimates of the moments of sums of independent random variables, Theor. Probability Appl. 29 (1985), no. 3, 574-577.

[29] G. Pisier, Non-commutative vector valued $L_p$-spaces and completely p-summing maps, Astérisque 247 (1998), vi+131. MR1648908 (2000a:46108)

[30] G. Pisier, Introduction to operator space theory, London Mathematical Society Lecture Note Series, vol. 294, Cambridge University Press, Cambridge, 2003. MR2006539 (2004k:46097)

[31] G. Pisier and Q. Xu, Non-commutative $L_p$-spaces, Handbook of the geometry of Banach spaces, Vol. 2, 2003, pp. 1459-1517. MR1999201 (2004i:46095)

[32] N. Randrianantoanina, Conditioned square functions for noncommutative martingales, Ann. Probab. 35 (2007), no. 3, 1039-1070. MR2319715 (2009d:46112)

[33] H. Rauhut, Compressive sensing and structured random matrices, Theoretical foundations and numerical methods for sparse recovery, 2010, pp. 1-92. MR2731597

[34] H. P. Rosenthal, On the subspaces of $L^p$ ($p > 2$) spanned by sequences of independent random variables, Israel J. Math. 8 (1970), 273-303. MR0271721 (42 #6602)

[35] M. Rudelson and R. Vershynin, On sparse reconstruction from Fourier and Gaussian measurements, Comm. Pure Appl. Math. 61 (2008), no. 8, 1025-1045. MR2417886 (2009e:94034)

[36] M. B. Ruskai, Inequalities for traces on von Neumann algebras, Comm. Math. Phys. 26 (1972), 280-289. MR0312284 (47 #846)






[37] N. M. Temme, Special functions, A Wiley-Interscience Publication, John Wiley & Sons Inc., New York, 1996. An introduction to the classical functions of mathematical physics. MR1376370 (97e:33002)

[38] J. A. Tropp, User-friendly tail bounds for sums of random matrices, ArXiv e-prints (April 2010), available at 1004.4389.

[39] D. V. Voiculescu, K. J. Dykema, and A. Nica, Free random variables, CRM Monograph Series, vol. 1, American Mathematical Society, Providence, RI, 1992. A noncommutative probability approach to free products with applications to random matrices, operator algebras and harmonic analysis on free groups. MR1217253 (94c:46133)

Department of Mathematics, University of Illinois, Urbana, IL 61801 
E-mail address: junge@math.uiuc.edu

Department of Mathematics, University of Illinois, Urbana, IL 61801 



E-mail address: zeng8@illinois.edu 



